Appendix C — Assignment C

Instructions

  1. You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity.

  2. Do not write your name on the assignment.

  3. Insert Code cells in the template provided to write solutions for the assignment. Do not open a new notebook and work from scratch.

  4. Write your code in the Code cells, and text in the Markdown cells of the Jupyter notebook. Ensure that the solution is written neatly enough to understand and grade.

  5. Use Quarto to print the .ipynb file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: quarto render filename.ipynb --to html. Submit the HTML file.

  6. There are 5 points for cleanliness and organization. The breakdown is as follows:

  • Must be an HTML file rendered using Quarto (1.5 pts).

  • There aren’t excessively long outputs of extraneous information (e.g., no printouts of unnecessary results without good reason, no long printouts of which iteration a loop is on, no long sections of commented-out code, etc.). There is no unnecessary / redundant code, and no unnecessary / redundant text (1 pt).

  • The code follows the Python style guide for naming variables, spaces, indentation, etc. (1 pt).

  • The code should be commented and clearly written with intuitive variable names. For example, use variable names such as number_input, factor, hours, instead of a, b, xyz, etc. For repetitive code chunks, either copy the comments or just leave a comment mentioning that the comment is the same as in the previous occurrence of the code chunk (1.5 pts).

  7. The assignment is worth 100 points, and is due on 3rd Feb 2024 at 11:59 pm.

  8. In C.1, C.2, and C.3, you are not allowed to use any kind of a loop outside list / dictionary comprehension.

C.1 List comprehension

Use list comprehension for all the questions below. You are not allowed to use loops outside the list comprehension.

USA’s GDP per capita from 1960 to 2021 is given by the tuple T in the code cell below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on.

T = (3007, 3067, 3244, 3375, 3574, 3828, 4146, 4336, 4696, 5032,
     5234, 5609, 6094, 6726, 7226, 7801, 8592, 9453, 10565, 11674,
     12575, 13976, 14434, 15544, 17121, 18237, 19071, 20039, 21417, 22857,
     23889, 24342, 25419, 26387, 27695, 28691, 29968, 31459, 32854, 34515,
     36330, 37134, 37998, 39490, 41725, 44123, 46302, 48050, 48570, 47195,
     48651, 50066, 51784, 53291, 55124, 56763, 57867, 59915, 62805, 65095,
     63028, 69288)

C.1.1

Use list comprehension to produce a list of the gaps between consecutive entries in T, i.e., the increase in GDP per capita with respect to the previous year. The list of gaps should look like: [60, 177, …]. Print the first five elements of the list, and the length of the list.

(4 points)
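
As a minimal sketch of the general idea (on made-up numbers, not the tuple T), one way to get differences between consecutive entries without an explicit loop is to pair each element with its successor using zip(). The names readings and differences are illustrative assumptions.

```python
# Toy data (hypothetical), standing in for any ordered sequence of numbers
readings = (10, 13, 19, 18, 25)

# zip pairs each element with the one that follows it;
# the comprehension subtracts the earlier value from the later one
differences = [later - earlier
               for earlier, later in zip(readings, readings[1:])]

print(differences)       # [3, 6, -1, 7]
print(len(differences))  # 4, one fewer than the number of readings
```

A sequence of n values always yields n - 1 consecutive differences, which is a quick sanity check on the length of the result.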

C.1.2

Use the list developed in C.1.1 to find the maximum gap size, i.e., the maximum increase in GDP per capita.

(1 point)

C.1.3

Using list comprehension with the list developed in C.1.1, find the percentage of gaps that have size greater than $1000.

(3 points)

C.1.4

Use list comprehension over the list developed in C.1.1 to print the list of years in which the GDP per capita increase was more than $2000.

Hint: The enumerate() function may help.

(4 points)
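
The hint can be illustrated on a toy example: enumerate() supplies the position of each element, and adding an offset converts that position into a label. The data and the names start_day, rainfall_mm and wet_days below are hypothetical; the right offset for a given problem depends on what the first element of the list corresponds to.

```python
# Hypothetical data: daily rainfall, where the first element belongs to day 1
start_day = 1
rainfall_mm = [0, 12, 3, 25, 0, 18]

# enumerate yields (position, value) pairs; keep the day labels
# of the entries that pass the filter
wet_days = [start_day + position
            for position, amount in enumerate(rainfall_mm)
            if amount > 10]

print(wet_days)  # [2, 4, 6]
```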

C.1.5

Use list comprehension to:

  1. Create a list that consists of the difference between the maximum and minimum GDP per capita values for each of the 5-year periods starting from 1976, i.e., for the periods 1976-1980, 1981-1985, 1986-1990, …, 2016-2020.

  2. Find the five-year period in which the difference (between the maximum and minimum GDP per capita values) was the least.

(4 + 2 points)
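
A sketch of one possible approach to block-wise summaries, shown on made-up numbers with blocks of three rather than five: slicing inside a comprehension cuts the sequence into fixed-size blocks, and a second comprehension summarises each block. All names here are illustrative assumptions.

```python
# Toy data (hypothetical), split into consecutive blocks of 3 values
values = [4, 9, 7, 12, 15, 11, 30, 22, 28]
block_size = 3

# Each slice starts where the previous block ended
blocks = [values[start:start + block_size]
          for start in range(0, len(values), block_size)]

# Spread (maximum minus minimum) within each block
spreads = [max(block) - min(block) for block in blocks]

# Position of the block with the smallest spread
smallest_spread_index = spreads.index(min(spreads))

print(spreads)                # [5, 4, 8]
print(smallest_spread_index)  # 1
```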

C.2 Nested list-comprehension

Below is the list consisting of the majors / minors of students of the course STAT303-1 Fall 2023. This data is a list of lists, where each sub-list (smaller list within the outer larger list) consists of the majors / minors of a student. Most of the students have majors / minors in one of these four areas:

  1. Math / Statistics / Computer Science

  2. Humanities / Communications

  3. Social Sciences / Education

  4. Physical Sciences / Natural Sciences / Engineering

Some students have majors / minors in other areas as well.

Use list comprehension for all the questions below. You are not allowed to use loops outside the list comprehension.

majors_minors = [['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Humanities / Communications',  'Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Humanities / Communications',  'Social Sciences / Education',  'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social 
Sciences / Education'], ['Physical Sciences / Natural Sciences / Engineering'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Cognitive Science'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Music'], ['Social Sciences / Education'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Data Science'], ['Social Sciences / Education'], ['Math / Statistics / Computer Science', 'jazz'], ['Humanities / Communications', 'Social Sciences / Education'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Humanities / Communications',  'Social Sciences / Education',  'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / 
Engineering',  'Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Econ'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', ''], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Humanities / Communications',  'Social Sciences / Education',  'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Humanities / Communications', 'Social 
Sciences / Education'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Social Sciences / Education'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Humanities / Communications', 'Social Sciences / Education'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education'], ['Humanities / Communications'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education',  'Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Humanities / Communications', 'Social Sciences / Education'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Humanities / Communications',  'Social Sciences / Education',  'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Social Sciences / Education', 'Math / Statistics / 
Computer Science'], ['Humanities / Communications'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Social Sciences / Education',  'Physical Sciences / Natural Sciences / Engineering'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science', 'Music'], ['Physical Sciences / Natural Sciences / Engineering'], ['Humanities / Communications'], ['Physical Sciences / Natural Sciences / Engineering'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education', 'Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science'], ['Physical Sciences / Natural Sciences / Engineering'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Social Sciences / Education'], ['Physical Sciences / Natural Sciences / Engineering'], ['Physical Sciences / Natural Sciences / Engineering',  'Math / Statistics / Computer Science'], ['Math / Statistics / Computer Science'], ['Humanities / Communications', 'Math / Statistics / Computer Science','Social Sciences / Education','Physical Sciences / Natural Sciences / Engineering'], ['Humanities / Communications', 'Math / Statistics / Computer Science', 'Social Sciences / Education', 'Data Science']]

C.2.1

Which majors / minors don’t fall into any of these four areas?

(3 points)
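
As a rough illustration of nested list comprehension (on a made-up list of lists, not majors_minors), two for clauses in one comprehension visit every element of every sub-list. The names baskets, all_items and distinct_items are hypothetical.

```python
# Toy data (hypothetical): each sub-list holds the items in one basket
baskets = [['apple', 'bread'], ['bread'], ['apple', 'milk', 'bread'], ['tea']]

# The outer clause walks over the baskets, the inner clause over each basket
all_items = [item for basket in baskets for item in basket]

# A set keeps one copy of each item; sorting gives a stable order
distinct_items = sorted(set(all_items))

# Membership test on each sub-list, counted with len
baskets_with_bread = len([basket for basket in baskets if 'bread' in basket])

print(distinct_items)      # ['apple', 'bread', 'milk', 'tea']
print(baskets_with_bread)  # 3
```

The for clauses are read left to right, in the same order as the equivalent nested loops would be written.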

C.2.2

How many students have Math / Statistics / Computer Science as an area of their major / minor?

(2 points)

C.2.3

How many students have Math / Statistics / Computer Science as the only area of their major / minor?

(3 points)

C.2.4

How many students have both Math / Statistics / Computer Science and Social Sciences / Education among the areas of their major / minor?

(4 points)

C.2.5

How many students have a major / minor in at least three of the above-mentioned four areas?

(5 points)
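
Counting how many elements of each sub-list belong to a reference list can be sketched with a comprehension inside a comprehension. The example below uses made-up club data; club_memberships and core_clubs are hypothetical names, and this is only one of several possible approaches.

```python
# Toy data (hypothetical): clubs joined by each of four students
club_memberships = [['chess', 'debate', 'choir'],
                    ['chess'],
                    ['robotics', 'chess', 'debate', 'film'],
                    ['film', 'choir']]
core_clubs = ['chess', 'debate', 'choir', 'robotics']

# Inner comprehension keeps the core clubs of one student;
# outer comprehension produces one count per student
core_club_counts = [len([club for club in clubs if club in core_clubs])
                    for clubs in club_memberships]

# Number of students who belong to three or more core clubs
students_in_three_or_more = len([count for count in core_club_counts
                                 if count >= 3])

print(core_club_counts)           # [3, 1, 3, 1]
print(students_in_three_or_more)  # 2
```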

C.3 Ted Talks

Use only list / dictionary comprehensions in this question. You are not allowed to use loops outside of list / dictionary comprehension.

C.3.1

Download the file TED_Talks.json on TED talks, and put it in the same folder as your notebook. Read the file using the code below. You will get the data in the object TED_Talks_data. Look at the data structure of TED_Talks_data: you will need to know how the data is structured in lists / dictionaries to answer the questions below.

(1 point)

import json

# Read the JSON file into a Python object (nested lists / dictionaries)
with open("TED_Talks.json", "r") as file:
    TED_Talks_data = json.load(file)
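
One way to explore the structure of a loaded JSON object is to check the type of the outer container, its length, and the keys and value types of its first element. The sketch below parses a small made-up JSON string with json.loads(); its keys (title, views, tags) are hypothetical and are not claimed to match the TED talks file.

```python
import json

# Hypothetical JSON text; the real file's layout and keys may differ
sample_text = ('[{"title": "A", "views": 10, "tags": ["x", "y"]}, '
               '{"title": "B", "views": 30, "tags": []}]')
sample_data = json.loads(sample_text)

# Type and size of the outer container
outer_type = type(sample_data).__name__
number_of_records = len(sample_data)

# Type of each field in the first record, via dictionary comprehension
field_types = {key: type(value).__name__
               for key, value in sample_data[0].items()}

print(outer_type, number_of_records)  # list 2
print(field_types)  # {'title': 'str', 'views': 'int', 'tags': 'list'}
```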

C.3.2

Find the number of talks in the dataset.

(1 point)

C.3.3

Find the headline, speaker and year_filmed of the talk with the highest number of views.

(6 points)
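
For a list of dictionaries, the built-in max() with a key function returns the whole record with the largest value of a chosen field, without an explicit loop. This is a sketch on made-up book data; the field names are hypothetical and unrelated to the keys in the TED talks data.

```python
# Toy data (hypothetical): one dictionary per book
books = [
    {'title': 'Dunes', 'author': 'Ito', 'year': 2001, 'copies_sold': 1200},
    {'title': 'Harbor', 'author': 'Mbeki', 'year': 2015, 'copies_sold': 5400},
    {'title': 'Orchard', 'author': 'Silva', 'year': 2009, 'copies_sold': 3100},
]

# key tells max which field to compare; the full dictionary is returned
best_seller = max(books, key=lambda book: book['copies_sold'])

# Dictionary comprehension to keep only the fields of interest
best_seller_summary = {field: best_seller[field]
                       for field in ('title', 'author', 'year')}

print(best_seller_summary)
# {'title': 'Harbor', 'author': 'Mbeki', 'year': 2015}
```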

C.3.4

What are the mean and median number of views for a talk? Can we say that the majority of talks (i.e., more than 50% of the talks) have fewer views than the average number of views for a talk? Justify your answer.

(5 points)
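
The relationship between the mean and the median can be seen on a small made-up sample with one very large value. The sketch below uses the standard-library statistics module; the data and names are hypothetical.

```python
from statistics import mean, median

# Toy data (hypothetical): one large value pulls the mean upwards
downloads = [2, 3, 3, 4, 5, 6, 40]

average_downloads = mean(downloads)   # 63 / 7 = 9
median_downloads = median(downloads)  # middle value of the sorted data = 4

# Share of observations that lie below the mean
share_below_average = (len([count for count in downloads
                            if count < average_downloads])
                       / len(downloads))

print(average_downloads, median_downloads)  # 9 4
print(round(share_below_average, 3))        # 0.857
```

By definition, at least half of the observations are less than or equal to the median, so comparing the median with the mean indicates on which side of the mean most observations fall.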

C.3.5

Do at least 25% of the talks have more views than the average number of views for a talk? Justify your answer.

(4 points)

C.3.6

Find the headline of the talk that received the highest number of votes in the Confusing category.

(8 points)

C.3.7

Find the headline and the year_filmed of the talk that received the highest percentage of votes in the Fascinating category.

\[\text{Percentage of } \textit{Fascinating} \text{ votes for a TED talk} = \frac{\text{Number of votes in the } \textit{Fascinating} \text{ category}}{\text{Total votes in all categories}} \times 100\]

(10 points)
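
The percentage formula can be sketched on made-up records. The layout below (a votes dictionary per item) is purely hypothetical and is not claimed to match how votes are stored in the TED talks data, which may need extra steps to extract.

```python
# Hypothetical layout: each item carries a dictionary of votes per category
reviews = [
    {'name': 'Item A',
     'votes': {'Fascinating': 30, 'Funny': 50, 'Confusing': 20}},
    {'name': 'Item B',
     'votes': {'Fascinating': 45, 'Funny': 5, 'Confusing': 10}},
]

# Percentage = votes in one category / total votes in all categories * 100
fascinating_percentages = {
    review['name']: (100 * review['votes']['Fascinating']
                     / sum(review['votes'].values()))
    for review in reviews
}

# Item whose percentage is the largest
top_item = max(fascinating_percentages, key=fascinating_percentages.get)

print(fascinating_percentages)  # {'Item A': 30.0, 'Item B': 75.0}
print(top_item)                 # Item B
```

Item A has the larger total number of votes but the smaller share, which is why the percentage and the raw count can rank items differently.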

C.4 Poker

The object deck defined below corresponds to a deck of cards. Estimate the probability that a five card hand will be:

  1. Five of a kind

  2. Straight flush

  3. Four of a kind

  4. Full house

  5. Flush

  6. Straight

  7. Three-of-a-kind

  8. Two-pair

  9. One-pair

  10. High card

You may check the meaning of the above terms here.

You must print the result as a table with two columns under the headers Hand type and Chance. The column Hand type should display the hand type, such as flush, straight, etc., and the column Chance should display the probability that the hand is of the given type, up to 3 decimal places. For example, if the probability of a flush is 30.564 %, then the 5th row of the table must have the values Flush, 30.564 %. Also, show the sum of all the probabilities in the last row of the table.

Note that a hand can be classified into only one of the 10 categories, which will be the category at the highest possible level in the hierarchy. For example, if a hand is a straight flush, then it is not a flush, and not a straight.

(25 points)

Hint:

Estimate these probabilities as follows.

  1. Write a function that accepts a hand of 5 cards as an argument, and returns relevant characteristics of the hand, such as the number of distinct card values, the maximum number of occurrences of a value, etc. Using the values returned by this function (maybe in a dictionary), you can determine whether the hand is of any of the 10 types.

  2. Randomly pull a hand of 5 cards from the deck. Call the function developed in (1) to get the relevant characteristics of the hand. Use those characteristics to determine if the hand is one of the 10 types.

  3. Repeat (2) 100,000 times.

  4. Estimate the probability of the hand being of each of the 10 types mentioned above from the results of the 100,000 simulations.
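
The simulate-and-count idea in the steps above can be sketched on a much simpler problem: estimating the chance that two dice sum to 7 (exact value 1/6). The dice example, the fixed seed and all names are illustrative assumptions, not part of the assignment.

```python
import random

random.seed(0)  # fixed seed, only so that repeated runs give the same estimate
number_of_trials = 100_000

# One simulated outcome per trial: the sum of two fair dice
dice_sums = [random.randint(1, 6) + random.randint(1, 6)
             for _ in range(number_of_trials)]

# Estimated probability = favourable outcomes / number of trials
estimated_chance = dice_sums.count(7) / number_of_trials

# Displayed as a percentage with 3 decimal places
print(f"{100 * estimated_chance:.3f} %")
```

With 100,000 trials the estimate typically lands within a fraction of a percentage point of the exact value; more trials narrow the gap further.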

You may use the function shuffle() from the library random to shuffle the deck every time before pulling a hand of 5 cards. Or you may use the sample() function to randomly sample 5 cards from the deck without shuffling.

For printing results as a table, use the Python function tabulate from the library tabulate.

You don’t need to stick to the hint if you feel you have a better way to do it. In case you have a better way, you can claim 10 bonus points for this assignment.

# 52 cards: 4 suits x 13 values (2 to 14)
deck = [{'value': value, 'suit': suit}
        for suit in ['spades', 'clubs', 'hearts', 'diamonds']
        for value in range(2, 15)]
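
As a starting point for the hint, the sketch below draws one hand with random.sample() and collects a few characteristics of it in a dictionary. The particular characteristics, the dictionary keys and the fixed seed are assumptions made for illustration; classifying a hand into the 10 types needs further logic that is not shown here.

```python
import random
from collections import Counter

# Same structure as the deck above: 4 suits x 13 values (2 to 14)
deck = [{'value': value, 'suit': suit}
        for suit in ['spades', 'clubs', 'hearts', 'diamonds']
        for value in range(2, 15)]

random.seed(1)  # fixed seed, only so that the example is repeatable
hand = random.sample(deck, 5)  # 5 distinct cards, drawn without replacement

# How many times each card value appears in the hand
value_counts = Counter(card['value'] for card in hand)

# Hypothetical set of characteristics; other choices are possible
characteristics = {
    'distinct_values': len(value_counts),
    'max_occurrences': max(value_counts.values()),
    'distinct_suits': len({card['suit'] for card in hand}),
}

print(hand)
print(characteristics)
```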